On cherche à étudier l’effet de trois facteurs sur le transcriptome des racines d’Arabidopsis. On cherche ici à utiliser et comparer différents packages de multiDE sur nos données.

Import des données

            R3   R2   R1   R5  R4   R6  R7  R11  R10  R12  R13  R17  R16  R14
AT1G01010 1526 1006 1116 1275 967 1018 854 1132 1294 1364 2325 2113 2193 2564
AT1G01020  416  285  289  349 364  370 226  513  502  561  461  407  432  614
AT1G01030   31   15   19   29  36   28  12   47   34   47   18   40   32   44
           R15  R18  R24  R19  R22  R23  R21  R20  R9  R8
AT1G01010 2364 2074 1987 2027 1754 1697 1537 1898 816 912
AT1G01020  380  502  484  467  426  415  413  462 223 312
AT1G01030   37   27   42   39   36   40   37   37  15  19
 [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
[1] 28775    24
[1] "cnF_3"
[1] At_AmbientCO2_LowNitrate_Fe1
48 Levels: At_AmbientCO2_HighNitrate_Fe1 ... Sl_ElevatedCO2_LowNitrate_FeStarvation3

TCC (Tag Count Comparison)

TCC is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq.

On crée l’objet TCC avec le design souhaité, et on filtre les gènes avec de faibles expressions (paramètre low.count).

Lors de la normalisation (DEGES,iedgeR), on fait un premier calcul des gènes DE, pour pouvoir les enlever lors de la normalisation. Le maramètre test.method permet de choisir la manière de détecter les genes DE (edgeR, DEsqe2, ou tBt (très long)). On peut répéter cette procédure jusqu’à la convergence des facteurs de taille des librairies, d’ou le i.

Count:
          cNF_3 cNF_2 cNF_1 cnF_2 cnF_1 cnF_3 CNF_1 CnF_2 CnF_1 CnF_3 cNf_1
AT1G01010  1526  1006  1116  1275   967  1018   854  1132  1294  1364  2325
AT1G01020   416   285   289   349   364   370   226   513   502   561   461
AT1G01030    31    15    19    29    36    28    12    47    34    47    18
          cnf_2 cnf_1 cNf_2 cNf_3 cnf_3 Cnf_3 CNf_1 Cnf_1 Cnf_2 CNf_3 CNf_2
AT1G01010  2113  2193  2564  2364  2074  1987  2027  1754  1697  1537  1898
AT1G01020   407   432   614   380   502   484   467   426   415   413   462
AT1G01030    40    32    44    37    27    42    39    36    40    37    37
          CNF_3 CNF_2
AT1G01010   816   912
AT1G01020   223   312
AT1G01030    15    19
 [ reached getOption("max.print") -- omitted 3 rows ]

Sample:
                                        group norm.factors lib.sizes
cNF_3            At_AmbientCO2_HighNitrate_Fe            1  32032772
cNF_2            At_AmbientCO2_HighNitrate_Fe            1  25708325
cNF_1            At_AmbientCO2_HighNitrate_Fe            1  25388926
cnF_2             At_AmbientCO2_LowNitrate_Fe            1  29410725
cnF_1             At_AmbientCO2_LowNitrate_Fe            1  26720600
cnF_3             At_AmbientCO2_LowNitrate_Fe            1  26429519
CNF_1           At_ElevatedCO2_HighNitrate_Fe            1  16944203
CnF_2            At_ElevatedCO2_LowNitrate_Fe            1  28518287
CnF_1            At_ElevatedCO2_LowNitrate_Fe            1  30134658
CnF_3            At_ElevatedCO2_LowNitrate_Fe            1  30939161
cNf_1  At_AmbientCO2_HighNitrate_FeStarvation            1  30097179
cnf_2   At_AmbientCO2_LowNitrate_FeStarvation            1  29411029
cnf_1   At_AmbientCO2_LowNitrate_FeStarvation            1  32417299
cNf_2  At_AmbientCO2_HighNitrate_FeStarvation            1  34497190
cNf_3  At_AmbientCO2_HighNitrate_FeStarvation            1  31947050
cnf_3   At_AmbientCO2_LowNitrate_FeStarvation            1  30166724
Cnf_3  At_ElevatedCO2_LowNitrate_FeStarvation            1  33780095
CNf_1 At_ElevatedCO2_HighNitrate_FeStarvation            1  30427501
Cnf_1  At_ElevatedCO2_LowNitrate_FeStarvation            1  29577130
Cnf_2  At_ElevatedCO2_LowNitrate_FeStarvation            1  30405383
CNf_3 At_ElevatedCO2_HighNitrate_FeStarvation            1  25049240
CNf_2 At_ElevatedCO2_HighNitrate_FeStarvation            1  28496520
CNF_3           At_ElevatedCO2_HighNitrate_Fe            1  18823627
CNF_2           At_ElevatedCO2_HighNitrate_Fe            1  20700711
    cNF_3     cNF_2     cNF_1     cnF_2     cnF_1     cnF_3     CNF_1     CnF_2 
0.9662816 0.9730845 0.8976118 0.9973121 1.0475825 1.0072335 0.9752235 1.0522596 
    CnF_1     CnF_3     cNf_1     cnf_2     cnf_1     cNf_2     cNf_3     cnf_3 
1.0186883 1.0510472 1.0019289 0.9977532 1.0188031 1.0400636 0.9948079 1.0087956 
    Cnf_3     CNf_1     Cnf_1     Cnf_2     CNf_3     CNf_2     CNF_3     CNF_2 
0.9381444 0.9961312 1.0067383 1.0059904 1.0542699 1.0349814 0.9249584 0.9903091 
   user  system elapsed 
 10.745   0.695  11.456 

[1] "13655  genes DE"
    gene_id a.value m.value p.value q.value rank estimatedDEG
1 AT1G02850      NA      NA       0       0   45            1
2 AT1G03870      NA      NA       0       0   45            1
3 AT1G05660      NA      NA       0       0   45            1
4 AT1G06120      NA      NA       0       0   45            1
5 AT1G09932      NA      NA       0       0   45            1
6 AT1G12030      NA      NA       0       0   45            1

Il semble que l’effet du fer soit le plus fort, et celui qui amplifie les autres effets. La technique utilisée ici identifie des DEG globalement, sans séparer lesquels sont dûs à quel effet.

Clustering sur les gènes DE

On utilise des modèles de mélange pour regrouper les gènes ayant des profils d’expression similaires dans les différentes conditions.

AT1G02850 AT1G03870 AT1G05660 AT1G06120 AT1G09930 AT1G12000 AT1G12050 AT1G12150 
    10061     89198      6423     10592      3134    131939     30686      1084 
AT1G13480 AT1G13590 AT1G17145 AT1G17150 AT1G26270 AT1G27050 AT1G43690 AT1G51710 
    43476     17469     20056        18     74275      6989     53707     85286 
AT1G51940 AT1G51950 AT1G52760 AT1G56250 AT1G65410 AT1G65870 AT1G69640 AT1G73030 
     7530     17490    142691      2919     10285        39     35266     69999 
AT1G74400 AT2G03710 AT2G13560 AT2G18350 AT2G21260 AT2G26290 AT2G29100 AT2G29730 
      680       249    171475      4716       140     23798      1005     20918 
AT2G30360 AT2G31910 AT2G38680 AT2G43220 AT3G03272 AT3G22420 AT3G26450 AT3G27900 
      957       220      5616      3733        84      1695      9460       289 
AT3G44550 AT3G44630 AT3G51360 AT3G52970 AT3G55485 AT3G56460 AT4G01335 AT4G11830 
    76049      9287     21040       207       146     47385       206     87385 
AT4G11850 AT4G11860 AT4G14790 AT4G15830 AT4G16515 AT4G21105 AT4G21850 AT4G24590 
   144744     35429     13142      8702        21     97849     81045     25035 
AT4G31360 AT4G32200 AT4G32340 AT4G32915 AT4G33160 AT5G04560 AT5G06100 AT5G10560 
     5780      5405       609      8515      6540     70613     16947     35020 
AT5G12330 AT5G13080 AT5G13090 AT5G16990 AT5G19560 AT5G23360 AT5G26670 AT5G27460 
    32944     15900     11540      3828      5193      2763     13725      9187 
AT5G37980 AT5G38200 AT5G38540 
     1072      8758     19441 
 [ reached getOption("max.print") -- omitted 13580 entries ]
****************************************
coseq analysis: Poisson approach & none transformation
K = 4 to 12 
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 4 ...
[1] "Initialization: 1"
[1] "Log-like diff: 6383.35560997041"
[1] "Log-like diff: 2541.07674853667"
[1] "Log-like diff: 3138.40011106496"
[1] "Log-like diff: 678.272267739288"
[1] "Log-like diff: 1066.10599923997"
Running g = 5 ...
[1] "Initialization: 1"
[1] "Log-like diff: 5400.25029148319"
[1] "Log-like diff: 5241.9010062319"
[1] "Log-like diff: 4685.38879778667"
[1] "Log-like diff: 3580.50546012803"
[1] "Log-like diff: 4811.55846886488"
Running g = 6 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1140.31426327848"
[1] "Log-like diff: 411.787293101512"
[1] "Log-like diff: 851.360266454866"
[1] "Log-like diff: 1793.95281200853"
[1] "Log-like diff: 3187.63748669865"
Running g = 7 ...
[1] "Initialization: 1"
[1] "Log-like diff: 2683.8750449631"
[1] "Log-like diff: 1201.68394211084"
[1] "Log-like diff: 405.376369060642"
[1] "Log-like diff: 447.164055201859"
[1] "Log-like diff: 309.386561233669"
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1157.93080280741"
[1] "Log-like diff: 399.753483835894"
[1] "Log-like diff: 517.710068453152"
[1] "Log-like diff: 376.842390858619"
[1] "Log-like diff: 307.473272700884"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 427.711994224773"
[1] "Log-like diff: 104.220355519937"
[1] "Log-like diff: 346.890544544054"
[1] "Log-like diff: 275.908685445025"
[1] "Log-like diff: 90.9370312869343"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 798.022501318409"
[1] "Log-like diff: 329.548169363579"
[1] "Log-like diff: 18.6276446152134"
[1] "Log-like diff: 217.089872419333"
[1] "Log-like diff: 402.178479307672"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1748.3185841292"
[1] "Log-like diff: 2485.73678952775"
[1] "Log-like diff: 862.45351058019"
[1] "Log-like diff: 999.852019477722"
[1] "Log-like diff: 3389.9291006394"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 951.255640789051"
[1] "Log-like diff: 849.651933918463"
[1] "Log-like diff: 918.754379816341"
[1] "Log-like diff: 1429.19576522524"
[1] "Log-like diff: 2279.99055144227"
$logLike


$ICL


$profiles


$boxplots


$probapost_boxplots


$probapost_barplots


$probapost_histogram

*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 4,5,6,7,8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -9323707
*************************************************
Number of clusters = 12
ICL = -9323707
*************************************************
Cluster sizes:
 Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5  Cluster 6  Cluster 7 
       403        962        360        195       3735       4105       1168 
 Cluster 8  Cluster 9 Cluster 10 Cluster 11 Cluster 12 
      1005        102        486        788        346 

Number of observations with MAP > 0.90 (% of total):
11829 (86.63%)

Number of observations with MAP > 0.90 per cluster (% of total per cluster):
 Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
 349       849       265       160       3275      3591      1058     
 (86.6%)   (88.25%)  (73.61%)  (82.05%)  (87.68%)  (87.48%)  (90.58%) 
 Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
 842       76        408        671        285       
 (83.78%)  (74.51%)  (83.95%)   (85.15%)   (82.37%)  

ACP

On représente le clustering dans la plan principal d’une ACP

Class: pca dudi
Call: dudi.pca(df = log(data[, colnames(data) != "cluster"] + 0.1), 
    center = TRUE, scale = TRUE, scannf = FALSE, nf = 4)

Total inertia: 24

Eigenvalues:
     Ax1      Ax2      Ax3      Ax4      Ax5 
22.60009  0.58171  0.12401  0.07288  0.04945 

Projected inertia (%):
    Ax1     Ax2     Ax3     Ax4     Ax5 
94.1671  2.4238  0.5167  0.3037  0.2061 

Cumulative projected inertia (%):
    Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
  94.17   96.59   97.11   97.41   97.62 

(Only 5 dimensions (out of 24) are shown)

Il semblerait que l’ACP détecte dans un premier temps (avec le premier vecteur principal) la valeur moyenne d’expression, puis l’expression diff induite par le fer.

On peut faire des représentations dans le plan des seconds et troisième axes principaux (le premier traduit le fer, le second la carence en nitrate).

Sur le cercle de corrélations dans le plan 2 et 3, on voit bien que lorsque la carence fer est là, on peut différencier un effet nitrate. Le CO2 est, lui encore difficile à identifier, le 4eme axe de l’ACP ne renseignant pas beaucoup…

Effet Nitrate en conditions normales

Design matrix not provided. Switch to the classic mode.

       cNF-cnF
Down       472
NotSig   10019
Up        1237

[1] "ON SELECTIONNE AU TOTAL :"
[1] "1709 genes with fdr <  0.01"

On retrouve une majorité de gènes activés par le fort nitrate. Les résultats sont un peu similaires à ceux du jeu de données d’entrainement de Nature entre témoin au KCl et Traitement au NO3.

Ontologies

Effet CO2

En conditions normales

Design matrix not provided. Switch to the classic mode.

       CNF-cNF
Down        25
NotSig   11597
Up          10

[1] "ON SELECTIONNE AU TOTAL :"
[1] "35 genes with fdr <  0.01"

En carence de nitrate

Design matrix not provided. Switch to the classic mode.

       CnF-cnF
Down       404
NotSig   10878
Up         491

[1] "ON SELECTIONNE AU TOTAL :"
[1] "895 genes with fdr <  0.01"

En carence en fer

Design matrix not provided. Switch to the classic mode.

       CNf-cNf
Down       646
NotSig   10560
Up         736

[1] "ON SELECTIONNE AU TOTAL :"
[1] "1382 genes with fdr <  0.01"

En carence des deux

Design matrix not provided. Switch to the classic mode.

       Cnf-cnf
Down       375
NotSig   11004
Up         587

[1] "ON SELECTIONNE AU TOTAL :"
[1] "962 genes with fdr <  0.01"

Effet Fer en condtions normales

Design matrix not provided. Switch to the classic mode.

       cNF-cNf
Down      2125
NotSig    7515
Up        2329

[1] "ON SELECTIONNE AU TOTAL :"
[1] "4454 genes with fdr <  0.01"

Enrichement on ontologies des clusters

A faire maybe soon

 

A work by Océane Cassan

oceane.cassan@supagro.fr